Skip to content

refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237

Merged
carlos-alm merged 12 commits into
mainfrom
refactor/titan-decomposition-features
May 28, 2026
Merged

refactor(features): decompose complexity/structure/owners; reduce cfg/cochange/feature-warnings complexity#1237
carlos-alm merged 12 commits into
mainfrom
refactor/titan-decomposition-features

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

  • Decomposes complexity, structure, graph-enrichment, structure-query, and owners modules
  • Reduces complexity in cfg and cochange
  • Reduces warning-level complexity in feature warnings batch

Commits

  • bdffbd1: refactor(features): decompose complexity, structure, graph-enrichment, structure-query, and owners
  • 9856136: refactor(features): reduce complexity in cfg and cochange
  • e52f18b: refactor(features): reduce warning-level complexity in feature warnings batch

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #9 of 10 (mergeOrder position: 9).

Note: Plan listed PR #3 (builder) and PR #5 (graph helpers) as dependencies — PR #3 was skipped due to merge conflict against main. Cherry-pick applied cleanly here, so review separately.

Caveats

  • WASM grammars not available in dev worktree — CI will run full test matrix

Test plan

  • CI passes (lint, build, full test matrix)
  • Verify no new cycles introduced (codegraph stats)

@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: Titan Decomposition Features ✅

Reviewing this refactoring PR following the 5-phase review process for codegraph.

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for rule relaxations
  • Phase 3: Test integrity - verify no weakened assertions
  • Phase 4: Code quality review - examine refactoring patterns
  • Phase 5: Backlog compliance - rate against foundation principles
  • Provide final assessment and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR is part of the Titan Paradigm cleanup (9/10 in merge order), systematically decomposing large, complex feature modules to improve maintainability and reduce cognitive load. The Titan Paradigm approach uses automated tooling to audit and refactor codebases for "Google-standard shape."

Phase 2: Configuration Change Detection ✅

CLEAN - No configuration changes detected:

  • No changes to biome.json, tsconfig.json, or package.json
  • No lint rule disabling or TypeScript suppressions
  • Only source code changes in src/features/*.ts files

Phase 3: Test Integrity ✅

CLEAN - No test modifications:

  • No test files changed
  • No weakened assertions or removed test cases
  • All test infrastructure remains unchanged

Phase 4: Code Quality Review ✅

Refactoring Pattern Analysis:

Positive patterns observed:

  • Function extraction: Large functions broken into focused helpers (e.g. classifyHalsteadToken, summarizeHalsteadCounts)
  • Clear separation of concerns: Helper functions have single responsibilities with descriptive names
  • Preserved public APIs: All exported functions maintain exact same signatures
  • Consistent style: JSDoc comments added for new internal functions
  • Meaningful names: Functions clearly indicate purpose
  • Type safety: Strong typing maintained throughout

Phase 5: Backlog Compliance ✅

Root Cause Analysis

Problem Being Solved: Systematic complexity reduction as part of Titan Paradigm cleanup - dogfooding codegraph on itself to prove the tool's effectiveness.
Why This Approach: Pure function extraction preserves behavior while reducing cognitive complexity and improving maintainability.
Risk Assessment: Very low risk - structural refactoring with no logic changes and comprehensive test coverage.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Improves maintainability without violating any core principles (P1-P8)
  • Problem-fit: 3 — Indirectly helps by improving codebase quality, but doesn't directly address AI token waste or blast radius detection
  • Breaking: No — All public APIs preserved, purely internal restructuring
  • Tier: 1

Critical Concerns

None identified. This is a clean, well-executed refactoring.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent refactoring execution with clear complexity reduction. The systematic approach demonstrates codegraph's effectiveness on itself. No functional changes, preserved APIs, comprehensive test coverage, and follows established patterns. Minor deduction only because this is internal cleanup rather than user-facing functionality.

Recommended next steps: Merge when CI passes. This refactoring should measurably reduce complexity metrics when codegraph analyzes itself.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

110 functions changed64 callers affected across 33 files

  • collectAllRules in src/features/boundaries.ts:238 (5 transitive callers)
  • loadImportEdges in src/features/boundaries.ts:251 (5 transitive callers)
  • ruleViolated in src/features/boundaries.ts:280 (4 transitive callers)
  • emitEdgeViolations in src/features/boundaries.ts:286 (5 transitive callers)
  • evaluateBoundaries in src/features/boundaries.ts:308 (8 transitive callers)
  • initCfgParsers in src/features/cfg.ts:110 (4 transitive callers)
  • buildVisitorCfgMap in src/features/cfg.ts:193 (3 transitive callers)
  • buildNativeCfgEntry in src/features/cfg.ts:376 (3 transitive callers)
  • tryNativeBulkInsertCfg in src/features/cfg.ts:413 (4 transitive callers)
  • CfgInsertStatements.insertBlock in src/features/cfg.ts:452 (0 transitive callers)
  • CfgInsertStatements.insertEdge in src/features/cfg.ts:453 (0 transitive callers)
  • prepareCfgInsertStatements in src/features/cfg.ts:456 (4 transitive callers)
  • persistAllFileCfgs in src/features/cfg.ts:472 (4 transitive callers)
  • buildCFGData in src/features/cfg.ts:510 (4 transitive callers)
  • pushHunkRanges in src/features/check.ts:28 (4 transitive callers)
  • parseDiffOutput in src/features/check.ts:48 (4 transitive callers)
  • rangesOverlap in src/features/check.ts:119 (3 transitive callers)
  • defEndLine in src/features/check.ts:126 (3 transitive callers)
  • checkMaxBlastRadius in src/features/check.ts:130 (4 transitive callers)
  • makeEmptyCheck in src/features/check.ts:382 (4 transitive callers)

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR is a Titan Paradigm cleanup pass that decomposes six large feature modules (complexity, structure, graph-enrichment, structure-query, owners, boundaries) and reduces cyclomatic complexity in cfg, cochange, check, dataflow, flow, and sequence by extracting focused helper functions — no new behaviour is introduced.

  • Decomposition: Each large export (evaluateBoundaries, buildCFGData, analyzeCoChanges, ownersData, hotspotsData, computeHalsteadMetrics, collectNativeBulkRows) is broken into private helpers; extracted types (NativeRowDecision, CallableNodeRow, OwnersDataResult, BfsFrame, etc.) replace anonymous inline shapes throughout.
  • Incidental fixes: cfg.ts eliminates a duplicate await import('../domain/parser.js') call; check.ts promotes the shared EMPTY_CHECK constant to a factory function, avoiding shared-object mutation risks; check.ts also hoists db.prepare outside the per-file loop for a small performance gain.
  • Correctness: Shared mutable BFS state (visited, cycles, nodeDepths) is passed by reference into FlowBfsFrame / BfsFrame / DataflowBfsState, correctly preserving cross-level accumulation.

Confidence Score: 5/5

Safe to merge — every extracted helper is a pure structural decomposition with no changed execution paths, and the two prior-thread issues (NativeRowDecision naming, owners sort order) are confirmed fixed.

All 12 changed files are straightforward extract-function refactors. Shared mutable BFS state is correctly threaded by reference through the new frame objects, the allNative guard remains at the call site in cfg.ts, and the duplicate parser.js import is eliminated. No logic changes were introduced beyond the incidental fixes in check.ts.

No files require special attention; the single observation in dataflow.ts around processDataflowNeighbor's unused return value is cosmetic and does not affect runtime behaviour.

Important Files Changed

Filename Overview
src/features/boundaries.ts Decomposed evaluateBoundaries into collectAllRules, loadImportEdges, ruleViolated, and emitEdgeViolations; behavior is identical to the original monolithic version.
src/features/cfg.ts Split buildCFGData into buildNativeCfgEntry, tryNativeBulkInsertCfg, prepareCfgInsertStatements, and persistAllFileCfgs; also fixes a redundant double-import of domain/parser.js; allNative guard is correctly preserved at the call site.
src/features/check.ts Extracted pushHunkRanges, rangesOverlap, defEndLine, and DefRow; promoted EMPTY_CHECK constant to a factory function (safer for callers); hoisted db.prepare outside the per-file loop (performance win); behavior unchanged.
src/features/cochange.ts Extracted loadLastAnalyzedSha, clearCoChangeTables, loadKnownFiles, persistCoChangeResults, recomputeJaccardForAffected, and updateCoChangeMeta; the incremental/full logic and DB-close-on-early-error path are faithfully preserved.
src/features/complexity.ts Extracted classifyHalsteadToken, summarizeHalsteadCounts, NativeRowDecision, classifyDefinitionForNativeBulk, buildNativeBulkRow, and collectFileBulkRows; skipTypes early-return and recursive walk logic are correctly preserved in the outer walk closure.
src/features/dataflow.ts Extracted DataflowBfsState, DataflowNeighbor, and processDataflowNeighbor; BFS logic is behaviourally correct, but the boolean return of processDataflowNeighbor is documented as a stop-signal yet is never used by the caller.
src/features/flow.ts Extracted FlowBfsFrame and processFlowCallee; outer mutable collections (visited, cycles, nodeDepths, idToNode) are correctly passed by reference into the frame, preserving accumulation across BFS levels.
src/features/graph-enrichment.ts Split prepareFileLevelData into loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode, and selectFileSeedNodes; the previously duplicated 'entry'/default branches in seed selection are correctly collapsed.
src/features/owners.ts Extracted OwnedSymbol, OwnerBoundary, OwnersDataResult, emptyOwnersResult, loadFilteredFiles, buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, and buildOwnersSummary; prior-thread sort-order fix is confirmed present (sorted spreads stored in boundary.from/to.owners).
src/features/sequence.ts Extracted CalleeNode, BfsFrame, processCallee, DataflowStmts, appendReturnMessages, and annotateCallParams; shared outer collections (messages, fileSet, visited) are passed by reference into the frame so mutations propagate correctly.
src/features/structure-query.ts Extracted HotspotEntry, computeHotspotDensity, mapNativeHotspotRow, mapJsHotspotRow, HOTSPOT_ORDER_BY, and buildHotspotQuery; JS path now lazily prepares a single statement rather than eagerly preparing all four; fallback to fan-in ordering is preserved.
src/features/structure.ts Extracted CallableNodeRow, buildActiveFilesSet, and buildClassifierInput; both classifyNodeRolesFull and classifyNodeRolesIncremental now share these helpers, eliminating the previously duplicated inline logic.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph boundaries.ts
        EB[evaluateBoundaries] --> CAR[collectAllRules]
        EB --> LIE[loadImportEdges]
        EB --> RV[ruleViolated]
        EB --> EEV[emitEdgeViolations]
    end
    subgraph cfg.ts
        BC[buildCFGData] --> TNBI[tryNativeBulkInsertCfg]
        BC --> PCIS[prepareCfgInsertStatements]
        BC --> PAFC[persistAllFileCfgs]
        TNBI --> BNCE[buildNativeCfgEntry]
    end
    subgraph complexity.ts
        CHM[computeHalsteadMetrics] --> CHT[classifyHalsteadToken]
        CHM --> SHC[summarizeHalsteadCounts]
        CNR[collectNativeBulkRows] --> CFB[collectFileBulkRows]
        CFB --> CDNB[classifyDefinitionForNativeBulk]
        CFB --> BNR[buildNativeBulkRow]
    end
    subgraph owners.ts
        OD[ownersData] --> LFF[loadFilteredFiles]
        OD --> BOI[buildOwnerIndex]
        OD --> LSF[loadSymbolsForFiles]
        OD --> COB[computeOwnerBoundaries]
        OD --> BOS[buildOwnersSummary]
    end
    subgraph cochange.ts
        AC[analyzeCoChanges] --> LLAS[loadLastAnalyzedSha]
        AC --> CCT[clearCoChangeTables]
        AC --> LKF[loadKnownFiles]
        AC --> PCR[persistCoChangeResults]
        AC --> RJA[recomputeJaccardForAffected]
        AC --> UCM[updateCoChangeMeta]
    end
Loading

Fix All in Claude Code

Reviews (7): Last reviewed commit: "Merge branch 'main' into refactor/titan-..." | Re-trigger Greptile

Comment on lines +554 to +582
/** Decision outcome for a single definition during native bulk-row collection.
* - 'skip': the definition is legitimately ignorable (non-function, missing line,
* interface stub, unsupported language).
* - 'fallback': a genuine function body is missing precomputed complexity —
* the whole native fast path must abort to JS.
* - 'emit': the definition has complexity data; the row was appended. */
type NativeRowDecision = 'skip' | 'fallback' | 'emit';

/** Classify a definition relative to the native bulk path. Returns
* 'skip' to ignore it, 'fallback' to bail out, or 'emit' if the row was added. */
function classifyDefinitionForNativeBulk(
def: FileSymbols['definitions'][0],
langSupported: boolean,
): 'skip' | 'fallback' | 'has-data' {
if (def.kind !== 'function' && def.kind !== 'method') return 'skip';
if (!def.line) return 'skip';
if (!def.complexity) {
// Interface/type property signatures and single-line stubs are extracted
// as methods but the native engine correctly never assigns complexity.
// Mirror the leniency in initWasmParsersIfNeeded to avoid bailing out
// of the native bulk-insert path for every TypeScript codebase (#846).
if (def.name.includes('.') || !def.endLine || def.endLine <= def.line) return 'skip';
// Languages without complexity rules will never have data — skip them
// rather than bailing out of the entire native bulk path.
if (!langSupported) return 'skip';
return 'fallback'; // genuine function body missing complexity — needs JS fallback
}
return 'has-data';
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent 'has-data' vs 'emit' naming

NativeRowDecision declares the three variants as 'skip' | 'fallback' | 'emit', and its JSDoc explains 'emit' as "the definition has complexity data; the row was appended." However, classifyDefinitionForNativeBulk is typed as returning 'skip' | 'fallback' | 'has-data' and its JSDoc also mis-states the third value as 'emit'. The type NativeRowDecision is never used as the return annotation here, so the two sets of string literals silently diverge. A reader tracing decision === 'has-data' in collectFileBulkRows won't find it in the documented type, making the intent of the abstraction harder to follow.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d46198d — classifyDefinitionForNativeBulk now returns NativeRowDecision directly (the third variant renamed from 'has-data' to 'emit' to match the documented type). The return annotation is now NativeRowDecision instead of the inline 'skip' | 'fallback' | 'has-data' literal, so the type and its callers are fully consistent.

Comment thread src/features/owners.ts
Comment on lines +279 to +295
const srcOwners = matchOwners(e.srcFile, rules);
const tgtOwners = matchOwners(e.tgtFile, rules);
// Cross-boundary: different owner sets (sort for deterministic comparison)
const srcKey = [...srcOwners].sort().join(',');
const tgtKey = [...tgtOwners].sort().join(',');
if (srcKey === tgtKey) continue;
boundaries.push({
from: {
name: e.srcName,
kind: e.srcKind,
file: e.srcFile,
line: e.srcLine,
owners: srcOwners,
},
to: { name: e.tgtName, kind: e.tgtKind, file: e.tgtFile, line: e.tgtLine, owners: tgtOwners },
edgeKind: e.edgeKind,
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent change in owners sort order inside returned boundaries

The original code called srcOwners.sort() (mutating in place) before storing the reference in boundaries.from.owners, so callers always received a sorted array. The refactored computeOwnerBoundaries correctly avoids the mutation by sorting a spread copy — but as a side effect, boundary.from.owners and boundary.to.owners are now returned in the original order from matchOwners rather than lexicographically sorted. For most uses this is the better behavior, but any downstream consumer that relied on the accidentally-sorted order (e.g., deterministic snapshot tests or display logic) will silently see different output. Worth confirming no caller depends on this.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d46198d — computeOwnerBoundaries now creates sortedSrc/sortedTgt via spread+sort before the key comparison, and stores these sorted arrays in boundary.from.owners and boundary.to.owners. This restores the lexicographically-sorted order that the original in-place sort provided, ensuring downstream consumers always receive sorted owner arrays.

…, structure-query,

and owners

Internal refactor — no public API or behaviour change, so docs check acknowledged.

- complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file
  helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from
  computeHalsteadMetrics.
- structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared
  buildActiveFilesSet + buildClassifierInput helpers.
- graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into
  loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode,
  selectFileSeedNodes.
- structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern
  (HOTSPOT_ORDER_BY) and mapNativeHotspotRow/mapJsHotspotRow helpers.
- owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles,
  buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
…ry owners

- complexity.ts: classifyDefinitionForNativeBulk now returns NativeRowDecision
  (renamed 'has-data' to 'emit') so the return annotation matches the shared
  union type; eliminates the silent divergence between the type and its callers.
- owners.ts: computeOwnerBoundaries now stores sortedSrc/sortedTgt in boundaries
  rather than the original unsorted arrays, restoring the lexicographic order
  that the original in-place sort provided.
@carlos-alm carlos-alm force-pushed the refactor/titan-decomposition-features branch from 496a718 to d46198d Compare May 27, 2026 05:21
@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

- cfg.ts: consolidate double dynamic import of ../domain/parser.js into
  a single `mod` variable used for both createParsers and getParser
- cfg.ts: type cfgRules as CfgRulesConfig in buildVisitorCfgMap, removing
  the `unknown` annotation and the downstream `as { functionNodes: string[] }` cast
- check.ts: replace mutable module-level EMPTY_CHECK constant with a
  makeEmptyCheck() factory to prevent shared-state corruption
carlos-alm added a commit that referenced this pull request May 27, 2026
- cfg.ts: consolidate double dynamic import of ../domain/parser.js into
  a single `mod` variable used for both createParsers and getParser
- cfg.ts: type cfgRules as CfgRulesConfig in buildVisitorCfgMap, removing
  the `unknown` annotation and the downstream `as { functionNodes: string[] }` cast
- check.ts: replace mutable module-level EMPTY_CHECK constant with a
  makeEmptyCheck() factory to prevent shared-state corruption
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed the three issues from Greptile's summary review (commit e549551):

  1. Double import in cfg.ts — consolidated the two import('../domain/parser.js') calls in initCfgParsers into a single mod variable used for both createParsers and getParser.

  2. cfgRules typed as unknown in buildVisitorCfgMap — imported CfgRulesConfig from ../types.js and typed the parameter correctly; removed the downstream as { functionNodes: string[] } cast (the field is already Set<string> on CfgRulesConfig).

  3. Mutable module-level EMPTY_CHECK constant — converted to a makeEmptyCheck() factory function so each return path gets a fresh object, preventing shared-state corruption if any caller were to mutate the result.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

carlos-alm and others added 4 commits May 27, 2026 01:26
…, structure-query, and owners

Internal refactor — no public API or behaviour change, so docs check acknowledged.

- complexity.ts: split collectNativeBulkRows (cog=70) into classify/build/collect-file
  helpers; extract classifyHalsteadToken + summarizeHalsteadCounts from
  computeHalsteadMetrics.
- structure.ts: merge classifyNodeRolesFull/Incremental DRY via shared
  buildActiveFilesSet + buildClassifierInput helpers.
- graph-enrichment.ts: decompose prepareFileLevelData (cog=32, cyc=26) into
  loadFileLevelEdges, computeFileFanCounts, detectFileCommunities, buildFileVisNode,
  selectFileSeedNodes.
- structure-query.ts: split hotspotsData (cog=34, sloc=102) using a strategy pattern
  (HOTSPOT_ORDER_BY) and mapNativeHotspotRow/mapJsHotspotRow helpers.
- owners.ts: split ownersData (sloc=158, bugs=1.55) into loadFilteredFiles,
  buildOwnerIndex, loadSymbolsForFiles, computeOwnerBoundaries, buildOwnersSummary.
@carlos-alm carlos-alm force-pushed the refactor/titan-decomposition-features branch from e3a022e to ce7f68c Compare May 28, 2026 04:37
@carlos-alm carlos-alm merged commit e5a56be into main May 28, 2026
21 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-decomposition-features branch May 28, 2026 05:08
@github-actions github-actions Bot locked and limited conversation to collaborators May 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant